Abstract: We introduce a simply stated conjecture regarding 
the maximum information a Boolean function can reveal about 
noisy inputs. While the conjecture remains open, we provide 
substantial evidence supporting its validity. 

I. Introduction 
This paper is inspired by the following conjecture: 

Conjecture 1. Let $X^n$ be i.i.d. Bernoulli(1/2), and let $Y^n$ be the result of passing $X^n$ through a memoryless binary symmetric channel with crossover probability $\alpha$. For any Boolean function $b : \{0,1\}^n \to \{0,1\}$, we have

$$I(b(X^n); Y^n) \le 1 - H(\alpha). \tag{1}$$



At first sight, Conjecture 1 might appear suitable as a homework exercise for a first course on information theory. However, over the course of this paper, we hope to convince the reader that the conjecture is much deeper than it appears. Despite its apparent simplicity, standard information-theoretic manipulations appear incapable of establishing (1).

To the present authors, Conjecture 1 represents the simplest nontrivial embodiment of Boolean functions in an information-theoretic context. In words, Conjecture 1 asks: "What is the most significant bit that $X^n$ can provide about $Y^n$?"

Despite their fundamental roles in computer science and digital computation, Boolean functions have received relatively little attention from the information theory community. The recent work [1] is perhaps most relevant to our Conjecture 1 and provides compelling motivation for its study. In [1], the authors prove that for $n$ and $\Pr\{b(X^n) = 0\} \ge 1/2$ fixed, $I(b(X^n); X_1)$ is maximized by functions $b$ which satisfy $b(X^n) = 0$ whenever $X_1 = 0$ (i.e., when $b$ is canalizing in $X_1$). The motivation for considering this problem comes from computational biology, where Boolean networks are used to model dependencies in various regulatory networks. We encourage the reader to refer to [1], [2] and the references therein for further information.

The conjecture is also related to the Information Bottleneck Method [3], which attempts to solve the optimization problem

$$\min \; I(X^n; U) - \lambda I(Y^n; U). \tag{2}$$



For a given $\lambda > 0$, the optimizing $U$ is purportedly the best tradeoff between the accuracy of describing $Y^n$ and the descriptive complexity of $U$. In our setting, $b(X^n)$ plays the role of $U$, and we constrain the descriptive complexity to be at most one bit. It is relatively easy to show that randomized Boolean functions do not yield a higher mutual information. Thus, expressing Conjecture 1 in terms of deterministic Boolean functions comes without loss of generality.

A more concrete example comes in the context of gambling. To this end, suppose $Y^n$ is a simple model for a market of $n$ stocks, where each stock doubles in value or goes bankrupt with probability 1/2, independent of all other stocks. If an oracle has access to side information $X^n$, and we are allowed to ask one yes/no question of the oracle, which question should we ask to maximize the rate at which our wealth grows? The validity of Conjecture 1 would imply that we should only concern ourselves with the performance of a single stock, say $Y_1$. This is readily seen as a consequence of known results on gambling with side information [4, Theorem 6.2.1], since putting $b(X^n) = X_1$ yields

$$I(b(X^n); Y^n) = I(X_1; Y^n) = I(X_1; Y_1) = 1 - H(\alpha), \tag{3}$$

hence the conjectured upper bound (1) is attainable and represents the maximum possible increase in doubling rate. We note that Erkip's Ph.D. thesis [5] has addressed the problem of gambling with side information, and the tools developed therein can be used to show the weaker bound



$$I(b(X^n); Y^n) \le \rho_m^2(X_1, Y_1) = (1 - 2\alpha)^2, \tag{4}$$

where $\rho_m(X_1, Y_1)$ is the Hirschfeld-Gebelein-Rényi maximal correlation between random variables $X_1$ and $Y_1$. Here, we pause to remark that the Fourier analytic techniques employed in [1] appear to only yield (4), and therefore seem insufficient for our purposes.
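To make the gap between the conjectured bound (1) and the weaker bound (4) concrete, the following minimal sketch tabulates both right-hand sides. The function names are illustrative choices made here, not notation from the paper, and entropy is measured in bits.

```python
from math import log2


def binary_entropy(a: float) -> float:
    """Binary entropy H(a) in bits, using the convention 0 log 0 = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * log2(a) - (1.0 - a) * log2(1.0 - a)


def conjectured_bound(a: float) -> float:
    """Right-hand side of (1): 1 - H(a)."""
    return 1.0 - binary_entropy(a)


def maximal_correlation_bound(a: float) -> float:
    """Right-hand side of (4): (1 - 2a)^2."""
    return (1.0 - 2.0 * a) ** 2


if __name__ == "__main__":
    # Print both bounds for a few crossover probabilities (hypothetical sample points).
    for a in (0.05, 0.1, 0.25, 0.4):
        print(f"alpha={a:.2f}  1-H(alpha)={conjectured_bound(a):.4f}  "
              f"(1-2alpha)^2={maximal_correlation_bound(a):.4f}")
```

On this grid the bound (4) is never smaller than the bound (1), which is consistent with calling (4) the weaker of the two.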

Finally, we point out that (1) is related in spirit to the notion of average sensitivity of Boolean functions. This topic has received a great deal of attention in the computer science literature (cf. [6]). To see the connection to sensitivity, note that (1) can be rewritten as

$$H(b(X^n) \mid Y^n) \ge H(b(X^n)) - 1 + H(\alpha). \tag{5}$$

For fixed $\Pr\{b(X^n) = 0\}$, the right hand side of (5) is constant. Hence, the conjecture essentially lower bounds the output uncertainty of Boolean functions with respect to noisy inputs.

This paper is organized as follows. Section II provides a summary of the main results and their implications. It includes a refinement of Conjecture 1 by splitting it into two "sub-conjectures." The following section deals with the proofs of the main results. Section IV delivers concluding remarks.



II. Results and Implications 

Let $X^n$ be a sequence of i.i.d. Bernoulli(1/2) random variables, and let $Z^n$ be a sequence of i.i.d. Bernoulli($\alpha$) random variables independent of $X^n$, $0 < \alpha < 1/2$. Let $Y^n = X^n \oplus Z^n$, where "$\oplus$" denotes coordinate-wise XOR. Throughout, we let $\Omega = \{0, 1\}$, $\Omega_n = \{0, 1\}^n$, and consider Boolean functions $b : \Omega_n \to \Omega$.
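For small $n$, the quantity $I(b(X^n); Y^n)$ in this setup can be evaluated by direct enumeration. The sketch below is only an illustration under the definitions above; the helper names `h2`, `mutual_information`, `dictator`, and `majority` are hypothetical and chosen here. It uses the facts that $Y^n$ is uniform and that $p(x^n \mid y^n) = \alpha^d (1-\alpha)^{n-d}$, where $d$ is the Hamming distance between $x^n$ and $y^n$.

```python
from itertools import product
from math import log2


def h2(p: float) -> float:
    """Binary entropy in bits, clamped so that rounding outside [0, 1] is harmless."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def mutual_information(b, n: int, alpha: float) -> float:
    """I(b(X^n); Y^n) in bits for X^n uniform and Y^n = X^n xor Z^n."""
    points = list(product((0, 1), repeat=n))
    zeros = [x for x in points if b(x) == 0]
    cond_entropy = 0.0
    for y in points:
        # q = P{b(X^n) = 0 | Y^n = y}
        q = 0.0
        for x in zeros:
            d = sum(xi != yi for xi, yi in zip(x, y))
            q += alpha ** d * (1.0 - alpha) ** (n - d)
        cond_entropy += h2(q) / len(points)
    return h2(len(zeros) / len(points)) - cond_entropy


def dictator(x):
    """b(x^n) = x_1."""
    return x[0]


def majority(x):
    """Majority vote of the input bits (n odd)."""
    return int(sum(x) > len(x) / 2)


if __name__ == "__main__":
    alpha = 0.1
    print("1 - H(alpha):", 1.0 - h2(alpha))
    print("dictator    :", mutual_information(dictator, 3, alpha))
    print("majority    :", mutual_information(majority, 3, alpha))
```

The dictator function attains $1 - H(\alpha)$, matching (3), while the three-input majority function falls below it for this choice of $\alpha$.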

Definition 1. The lexicographical ordering $\prec_L$ on $\{0, 1\}^k$ is defined as follows: $x^k \prec_L \tilde{x}^k$ iff $x_j < \tilde{x}_j$ for some $j$ and $x_i = \tilde{x}_i$ for all $i < j$.

For example, if $k = 3$, we have $000 \prec_L 001 \prec_L 010 \prec_L 011 \prec_L 100 \prec_L 101 \prec_L 110 \prec_L 111$.

Definition 2. We define $L_k(M)$ to be the initial segment of size $M$ in the lexicographical ordering on $\{0, 1\}^k$. For example, $L_3(4) = \{000, 001, 010, 011\}$.

For a function $b : \Omega_n \to \Omega$, we say that "$b$ is lex" when $b^{-1}(0) = L_n(|b^{-1}(0)|)$. In other words, $b$ is lex when it maps an initial segment of the lexicographical order to 0, and the complement segment to 1.
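Definitions 1 and 2 translate directly into code. The following is a small sketch with binary strings as the representation; `lex_initial_segment` and `is_lex` are names introduced here for illustration.

```python
from itertools import product


def lex_initial_segment(k: int, m: int) -> list:
    """L_k(M): the first m strings of {0,1}^k in lexicographical order."""
    # itertools.product enumerates tuples in lexicographical order of its input.
    return ["".join(bits) for bits in product("01", repeat=k)][:m]


def is_lex(zero_set, n: int) -> bool:
    """True when the preimage b^{-1}(0) is an initial segment of the lex order."""
    return set(zero_set) == set(lex_initial_segment(n, len(set(zero_set))))


if __name__ == "__main__":
    print(lex_initial_segment(3, 4))
    print(is_lex({"000", "001"}, 3), is_lex({"000", "010"}, 3))
```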

Instead of dealing with Conjecture 1 directly, consider the following two conjectures:

Conjecture 2. For a given $n$ and fixed cardinality $|b^{-1}(0)|$, the conditional entropy $H(b(X^n) \mid Y^n)$ is minimized when $b$ is lex.






Conjecture 3. If $b$ is lex, then

$$H(b(X^n) \mid Y^n) \ge H(b(X^n)) H(\alpha). \tag{6}$$

Clearly, Conjecture 1 would follow as a corollary if Conjectures 2 and 3 were valid. Indeed, together they give $H(b(X^n) \mid Y^n) \ge H(b(X^n)) H(\alpha)$ for every $b$, so that $I(b(X^n); Y^n) \le H(b(X^n))(1 - H(\alpha)) \le 1 - H(\alpha)$.

Referring to Conjecture 3 as a "conjecture" is perhaps too modest. Indeed, we give a computer aided proof of (6) for $\alpha$ ranging from 0 to 1/2 in increments of 0.001. Refer to Theorem 3 in Section III-B for details.

Conjecture 2 is reminiscent of a classical theorem in discrete mathematics originally due to Harper [7] that gives an exact edge-isoperimetric inequality for the hypercube. To state the theorem, we need a few basic notations. Let $Q_n$ be the $n$-dimensional hypercube, and let $V(Q_n) = \Omega_n$ be its set of vertices. For $S \subseteq V(Q_n)$, the edge boundary $\partial(S)$ is the set of edges one has to delete to disconnect $S$ from any vertex not in $S$.

Theorem 1. For $S \subseteq V(Q_n)$ with $|S| = k$, we have $|\partial(S)| \ge |\partial(L_n(k))|$.
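Theorem 1 can be checked exhaustively on a small hypercube. In the sketch below, vertices of $Q_n$ are encoded as integers $0, \dots, 2^n - 1$, so that the lex initial segment $L_n(k)$ is simply $\{0, \dots, k-1\}$; this encoding and the function names are assumptions made here for illustration only.

```python
from itertools import combinations


def edge_boundary_size(S, n: int) -> int:
    """Number of hypercube edges with exactly one endpoint in S."""
    S = set(S)
    # Flipping bit i of v gives the neighbor of v along coordinate i.
    return sum(1 for v in S for i in range(n) if v ^ (1 << i) not in S)


def harper_holds(n: int) -> bool:
    """Brute-force check that lex initial segments minimize the edge boundary."""
    for k in range(2 ** n + 1):
        best = edge_boundary_size(range(k), n)
        for S in combinations(range(2 ** n), k):
            if edge_boundary_size(S, n) < best:
                return False
    return True


if __name__ == "__main__":
    print(harper_holds(3))
```

The search is exponential in $2^n$, so it is only practical for very small $n$.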

The simplest proofs of Theorem 1 rely on so-called compression operators, popularized by Bollobás and Leader [8]. These compression operators turn out to be useful in making progress towards Conjecture 2, so we now introduce them.

Let $I$ be a subset of $\{1, 2, \dots, n\}$ of cardinality $k$. To be concrete, let $I = \{i_1, \dots, i_k\}$, where $i_1 < i_2 < \dots < i_k$. For a set $A \subseteq \Omega_n$ and $x^n$ having $x_i = 0$ for all $i \in I$, we define the $I$-section of $A$ at $x^n$ by

$$A_I(x^n) = \left\{ z^k : y^n \in A,\ y_i = \begin{cases} z_j & \text{if } i = i_j \in I \\ x_i & \text{otherwise} \end{cases} \right\}.$$

For instance, if $A = \{000, 001, 011, 101\}$, then examples of $I$-sections at different $x^3 \in \Omega_3$ are given by:

$$A_{\{1\}}(001) = \{0, 1\}, \tag{7}$$
$$A_{\{2\}}(100) = \emptyset, \tag{8}$$
$$A_{\{1,2\}}(000) = \{00\}, \tag{9}$$
$$A_{\{1,2\}}(001) = \{00, 01, 10\}. \tag{10}$$
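The $I$-section is perhaps easiest to understand operationally: substitute every possible $z^k$ into the coordinates of $x^n$ indexed by $I$ and keep those $z^k$ for which the result lies in $A$. The sketch below follows that reading, with strings for points and 1-based coordinates as in the text; the function name `section` is chosen here.

```python
from itertools import product


def section(A, I, x: str) -> set:
    """I-section of A at x: all z such that writing z into coordinates I of x lands in A."""
    positions = sorted(I)  # 1-based coordinates i_1 < ... < i_k
    out = set()
    for z in product("01", repeat=len(positions)):
        y = list(x)
        for pos, bit in zip(positions, z):
            y[pos - 1] = bit
        if "".join(y) in A:
            out.add("".join(z))
    return out


if __name__ == "__main__":
    A = {"000", "001", "011", "101"}
    print(section(A, {1}, "001"))
    print(section(A, {2}, "100"))
    print(section(A, {1, 2}, "000"))
    print(section(A, {1, 2}, "001"))
```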



The $I$-compression of $A$, $C_I(A)$, is defined in terms of its $I$-sections:

$$(C_I(A))_I(x^n) = L_k(|A_I(x^n)|).$$

In other words, $C_I$ replaces each $I$-section of $A$ with an initial segment of the lexicographical order. We say that $A$ is $I$-compressed if $C_I(A) = A$. Note that $C_I(A)$ is always $I$-compressed.

Continuing the above example of $A = \{000, 001, 011, 101\}$, several different $I$-compressions are given by:

$$C_{\{1\}}(A) = \{000, 001, 011, 101\}, \tag{11}$$
$$C_{\{2\}}(A) = \{000, 001, 011, 101\}, \tag{12}$$
$$C_{\{2,3\}}(A) = \{000, 001, 010, 100\}, \tag{13}$$
$$C_{\{1,3\}}(A) = \{000, 001, 010, 100\}. \tag{14}$$
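The compression operator can be sketched in the same style: for each assignment of the coordinates outside $I$, count the points of $A$ on that fiber and replace them by the lexicographically first strings of the same count. The name `compress` and the string representation are illustrative assumptions.

```python
from itertools import product


def compress(A, I, n: int) -> set:
    """I-compression C_I(A) of a set A of length-n binary strings (1-based coordinates)."""
    positions = sorted(I)
    others = [i for i in range(1, n + 1) if i not in I]
    lex = ["".join(z) for z in product("01", repeat=len(positions))]
    result = set()
    for rest in product("01", repeat=len(others)):
        def fill(z):
            # Build the point whose I-coordinates are z and whose other coordinates are rest.
            y = ["0"] * n
            for pos, bit in zip(others, rest):
                y[pos - 1] = bit
            for pos, bit in zip(positions, z):
                y[pos - 1] = bit
            return "".join(y)

        size = sum(1 for z in lex if fill(z) in A)   # |A_I(x^n)| on this fiber
        result.update(fill(z) for z in lex[:size])   # replace by L_k(size)
    return result


if __name__ == "__main__":
    A = {"000", "001", "011", "101"}
    for I in ({1}, {2}, {2, 3}, {1, 3}):
        print(sorted(I), sorted(compress(A, I, 3)))
```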

We pause to make two important observations. First, $I$-compression preserves the size of the set on which it operates. That is, $|C_I(A)| = |A|$. Second, if $A$ is $I$-compressed, then it is also $J$-compressed for all $J \subset I$.

The following theorem states that when $|I| = 2$, applying an $I$-compression to $b^{-1}(0)$ does not decrease the information $b(X^n)$ reveals about $Y^n$. Thus, compression provides a method of modifying functions in a manner that does not adversely affect the mutual information $I(b(X^n); Y^n)$.

Theorem 2. Let $b : \Omega_n \to \Omega$ and let $I \subseteq \{1, \dots, n\}$ satisfy $|I| = 2$. If $\tilde{b} : \Omega_n \to \Omega$ is defined by its preimage $\tilde{b}^{-1}(0) = C_I(b^{-1}(0))$, then $I(\tilde{b}(X^n); Y^n) \ge I(b(X^n); Y^n)$.
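Theorem 2 lends itself to an exhaustive numerical sanity check for small $n$. The following sketch enumerates every Boolean function on $n$ inputs through its zero set, applies each 2-coordinate compression, and compares mutual informations. It is an independent illustration, not the verification code used for the results in this paper; all function names and the tolerance `tol` are assumptions introduced here.

```python
from itertools import combinations, product
from math import log2


def h2(p: float) -> float:
    """Binary entropy in bits, clamped against rounding outside [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def mutual_information(zero_set, n: int, alpha: float) -> float:
    """I(b(X^n); Y^n) in bits, where b is given by its preimage of 0."""
    points = list(product((0, 1), repeat=n))
    cond = 0.0
    for y in points:
        q = 0.0
        for x in zero_set:
            d = sum(a != c for a, c in zip(x, y))
            q += alpha ** d * (1.0 - alpha) ** (n - d)
        cond += h2(q) / len(points)
    return h2(len(zero_set) / len(points)) - cond


def compress(A, I, n: int) -> set:
    """I-compression of a set A of 0/1 tuples (1-based coordinates in I)."""
    positions = sorted(I)
    others = [i for i in range(1, n + 1) if i not in I]
    lex = list(product((0, 1), repeat=len(positions)))
    out = set()
    for rest in product((0, 1), repeat=len(others)):
        def fill(z):
            y = [0] * n
            for pos, bit in zip(others, rest):
                y[pos - 1] = bit
            for pos, bit in zip(positions, z):
                y[pos - 1] = bit
            return tuple(y)

        size = sum(1 for z in lex if fill(z) in A)
        out.update(fill(z) for z in lex[:size])
    return out


def theorem2_holds(n: int, alpha: float, tol: float = 1e-12) -> bool:
    """Check I(b~(X^n); Y^n) >= I(b(X^n); Y^n) for every b and every |I| = 2."""
    points = list(product((0, 1), repeat=n))
    for mask in range(2 ** len(points)):
        zero_set = {p for j, p in enumerate(points) if (mask >> j) & 1}
        base = mutual_information(zero_set, n, alpha)
        for I in combinations(range(1, n + 1), 2):
            if mutual_information(compress(zero_set, I, n), n, alpha) < base - tol:
                return False
    return True


if __name__ == "__main__":
    print(theorem2_holds(3, 0.1))
```

The enumeration covers all $2^{2^n}$ functions, so it is feasible only for $n \le 4$ or so without further pruning.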

By definition, if $C_I(\cdot)$ changes an element of $b^{-1}(0)$, it moves it lower in the lexicographical ordering on $\Omega_n$. Therefore, one can repeatedly apply Theorem 2 for different subsets $I$ of cardinality 2, ultimately terminating at a function $\tilde{b}$ which is $I$-compressed for all $I$ with $|I| \le 2$. Hence, we have the following corollary.

Corollary 1. Let $\mathcal{S}_n$ be the set of functions $b : \Omega_n \to \Omega$ for which $b^{-1}(0)$ is $I$-compressed for all $I$ with $|I| \le 2$. In maximizing $I(b(X^n); Y^n)$, it is sufficient to consider functions $b \in \mathcal{S}_n$.

The implications of Theorem 2 and its corollary are twofold. First, they allow the verification of Conjecture 2 for modest values of $n$. Indeed, we have numerically validated Conjectures 1 and 2 for $n \le 7$ by evaluating $I(b(X^n); Y^n)$ for $b \in \mathcal{S}_n$.

n    $|\mathcal{S}_n|$    $|\mathcal{B}_n|$
3    10        256
4    25        65,536
5    119       $4.3 \times 10^{9}$
6    1173      $1.8 \times 10^{19}$
7    44,315    $3.4 \times 10^{38}$

TABLE I: Reduction in number of candidate Boolean functions to be considered for verification of Conjecture 2



To appreciate the reduction afforded by Corollary 1, define $\mathcal{B}_n$ to be the set of all $2^{2^n}$ Boolean functions on $n$ inputs. A comparison between $|\mathcal{S}_n|$ and $|\mathcal{B}_n|$ is given in Table I.
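For very small $n$, the size of $\mathcal{S}_n$ can be found by brute force: enumerate every subset of $\Omega_n$ and keep those that are $I$-compressed for all $I$ with $|I| \le 2$. The sketch below does this with hypothetical helper names; it is feasible only for tiny $n$, since it visits all $2^{2^n}$ subsets.

```python
from itertools import combinations, product


def compress(A, I, n: int) -> set:
    """I-compression of a set A of 0/1 tuples (1-based coordinates in I)."""
    positions = sorted(I)
    others = [i for i in range(1, n + 1) if i not in I]
    lex = list(product((0, 1), repeat=len(positions)))
    out = set()
    for rest in product((0, 1), repeat=len(others)):
        def fill(z):
            y = [0] * n
            for pos, bit in zip(others, rest):
                y[pos - 1] = bit
            for pos, bit in zip(positions, z):
                y[pos - 1] = bit
            return tuple(y)

        size = sum(1 for z in lex if fill(z) in A)
        out.update(fill(z) for z in lex[:size])
    return out


def count_compressed(n: int) -> int:
    """Number of subsets of {0,1}^n that are I-compressed for every |I| <= 2."""
    points = list(product((0, 1), repeat=n))
    index_sets = [I for r in (1, 2) for I in combinations(range(1, n + 1), r)]
    count = 0
    for mask in range(2 ** len(points)):
        A = {p for j, p in enumerate(points) if (mask >> j) & 1}
        if all(compress(A, I, n) == A for I in index_sets):
            count += 1
    return count


if __name__ == "__main__":
    print(count_compressed(3))  # compare with the n = 3 row of Table I
```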

Second, Theorem 2 reinforces the intuition behind Conjecture 2. As we noted above, if $C_I(\cdot)$ changes an element of $b^{-1}(0)$, it moves it lower in the lexicographical ordering on $\Omega_n$. Thus, roughly speaking, applying $I$-compression to $b^{-1}(0)$ yields a function $\tilde{b}$ which is (i) closer to an initial segment of the lexicographical order, and (ii) for $|I| \le 2$ satisfies $H(\tilde{b}(X^n) \mid Y^n) \le H(b(X^n) \mid Y^n)$.

Ideally, Theorem 2 should generalize to include $I$-compressions for $|I| > 2$. Indeed, if we could take $|I| = n$, Conjecture 2 would be proved. However, we have found counterexamples where compression increases $H(b(X^n) \mid Y^n)$ for $|I| > 2$ (but still reduces $H(b(X^n) \mid Y^n)$ for $|I| = n$). We omit the details.



III. Proofs 

A. Proof of Theorem 2

We begin the proof of Theorem 2 by first proving the following result for 1-dimensional compressions.

Lemma 1. Let $b : \Omega_n \to \Omega$ and $i \in \{1, 2, \dots, n\}$. If $\tilde{b} : \Omega_n \to \Omega$ is defined by its preimage $\tilde{b}^{-1}(0) = C_{\{i\}}(b^{-1}(0))$, then $I(\tilde{b}(X^n); Y^n) \ge I(b(X^n); Y^n)$.

Proof: It suffices to consider the case where $i = n$, as any other case can be handled by first permuting coordinates.

First note that we can assume $\Pr\{b(X^n) = 0 \mid y^{n-1}, 0\} \ge \Pr\{b(X^n) = 0 \mid y^{n-1}, 1\}$ for a fixed $y^{n-1}$. To see this, define $A_0 = \{x^{n-1} : (x^{n-1}, 0) \in b^{-1}(0)\}$ and $A_1 = \{x^{n-1} : (x^{n-1}, 1) \in b^{-1}(0)\}$. By considering the function $b$, or equivalently the function $b'(x^n) := b(x^{n-1}, \neg x_n)$, we can guarantee that

$$\sum_{x^{n-1} \in A_0} p(x^{n-1} \mid y^{n-1}) \ge \sum_{x^{n-1} \in A_1} p(x^{n-1} \mid y^{n-1}). \tag{15}$$

Then note that, since $\alpha < 1/2$, and

$$\Pr\{b(X^n) = 0 \mid y^{n-1}, 0\} = (1 - \alpha) \sum_{x^{n-1} \in A_0} p(x^{n-1} \mid y^{n-1}) + \alpha \sum_{x^{n-1} \in A_1} p(x^{n-1} \mid y^{n-1}),$$
$$\Pr\{b(X^n) = 0 \mid y^{n-1}, 1\} = \alpha \sum_{x^{n-1} \in A_0} p(x^{n-1} \mid y^{n-1}) + (1 - \alpha) \sum_{x^{n-1} \in A_1} p(x^{n-1} \mid y^{n-1}),$$

we have $\Pr\{b(X^n) = 0 \mid y^{n-1}, 0\} \ge \Pr\{b(X^n) = 0 \mid y^{n-1}, 1\}$ as desired.

Next, define $B = b^{-1}(0)$ and $\tilde{b}^{-1}(0) = C_{\{n\}}(B)$, and let

$$E = \{x^{n-1} : B_{\{n\}}(x^{n-1}, 0) = \{1\}\}. \tag{16}$$

Then,

$$\Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 0\} = \Pr\{b(X^n) = 0 \mid y^{n-1}, 0\} + (1 - 2\alpha) \Pr\{X^{n-1} \in E \mid y^{n-1}\},$$
$$\Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 1\} = \Pr\{b(X^n) = 0 \mid y^{n-1}, 1\} - (1 - 2\alpha) \Pr\{X^{n-1} \in E \mid y^{n-1}\}.$$

It is readily verified that $\{\Pr\{b(X^n) = 0 \mid y^{n-1}, 0\}, \Pr\{b(X^n) = 0 \mid y^{n-1}, 1\}\}$ is majorized by $\{\Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 0\}, \Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 1\}\}$, and hence Schur-concavity of entropy implies

$$
\begin{aligned}
H(\tilde{b}(X^n) \mid y^{n-1}, Y_n) &\quad (17)\\
&= \tfrac{1}{2} h_2\big(\Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 0\}\big) + \tfrac{1}{2} h_2\big(\Pr\{\tilde{b}(X^n) = 0 \mid y^{n-1}, 1\}\big) &\quad (18)\\
&\le \tfrac{1}{2} h_2\big(\Pr\{b(X^n) = 0 \mid y^{n-1}, 0\}\big) + \tfrac{1}{2} h_2\big(\Pr\{b(X^n) = 0 \mid y^{n-1}, 1\}\big) &\quad (19)\\
&= H(b(X^n) \mid y^{n-1}, Y_n), &\quad (20)
\end{aligned}
$$

where $h_2(x) = -x \log_2(x) - (1 - x) \log_2(1 - x)$ is the binary entropy function. Now, we average (17) and (20) over all values of $y^{n-1}$ to conclude that $H(\tilde{b}(X^n) \mid Y^n) \le H(b(X^n) \mid Y^n)$. To complete the proof, we recall that $|\tilde{b}^{-1}(0)| = |C_{\{n\}}(b^{-1}(0))| = |b^{-1}(0)|$. Combined with the fact that $X^n$ is uniformly distributed on $\Omega_n$, this implies that $H(\tilde{b}(X^n)) = H(b(X^n))$, as desired. ■

We are now in a position to finish the proof of Theorem 2.

Proof of Theorem 2: We assume that $I = \{n - 1, n\}$, as all other cases follow by a permutation of coordinates. To simplify notation, we write $B = b^{-1}(0)$.

By a repeated application of Lemma 1, we can assume that $B$ is $\{n - 1\}$- and $\{n\}$-compressed. Thus, the $I$-sections $B_I(x^n)$ can only be one of the following: $\emptyset$, $\{00\}$, $\{00, 01\}$, $\{00, 10\}$, $\{00, 01, 10\}$, or $\{00, 01, 10, 11\}$. Note that all of these sets are initial segments of the lexicographical order on $\Omega_2$ except $\{00, 10\}$. Hence, we aim to transform $B$ so that $B_I(x^n) \ne \{00, 10\}$. To this end, define

$$E_0 = \{x^n : x_{n-1} = x_n = 0,\ B_I(x^n) = \{00, 10\}\}, \tag{21}$$
$$E_1 = \{x^n : x_{n-1} = x_n = 0,\ B_I(x^n) = \{00, 01\}\}. \tag{22}$$

Now, define $\tilde{b}$ by $\tilde{b}^{-1}(0) = C_I(B)$ and the function $\hat{b}$ by permuting the last two coordinates:

$$\hat{b}(x^{n-2}, x_{n-1}, x_n) = \tilde{b}(x^{n-2}, x_n, x_{n-1}). \tag{23}$$

It is relatively simple to show that

$$\Pr\{b(X^n) = 0 \mid y^n\} = \theta \Pr\{\tilde{b}(X^n) = 0 \mid y^n\} + (1 - \theta) \Pr\{\hat{b}(X^n) = 0 \mid y^n\}, \tag{24}$$

where

$$\theta = \frac{\Pr\{X^{n-2} \in E_1 \mid y^{n-2}\}}{\Pr\{X^{n-2} \in E_0 \mid y^{n-2}\} + \Pr\{X^{n-2} \in E_1 \mid y^{n-2}\}}. \tag{25}$$

Concavity of entropy implies that

$$\theta H(\tilde{b}(X^n) \mid y^n) + (1 - \theta) H(\hat{b}(X^n) \mid y^n) \le H(b(X^n) \mid y^n).$$

Noting that $\theta$ only depends on $y^{n-2}$, we average both sides over $y_{n-1}, y_n$ to obtain

$$\theta H(\tilde{b}(X^n) \mid y^{n-2}, Y_{n-1}^n) + (1 - \theta) H(\hat{b}(X^n) \mid y^{n-2}, Y_{n-1}^n) \le H(b(X^n) \mid y^{n-2}, Y_{n-1}^n). \tag{26}$$

Crucially, the symmetry (23) implies that

$$H(\hat{b}(X^n) \mid y^{n-2}, Y_{n-1}^n) = H(\tilde{b}(X^n) \mid y^{n-2}, Y_{n-1}^n). \tag{27}$$

Combining this with (26) and averaging over $y^{n-2}$ proves $H(\tilde{b}(X^n) \mid Y^n) \le H(b(X^n) \mid Y^n)$. Since $H(\tilde{b}(X^n)) = H(b(X^n))$, the proof is complete. ■

B. An Algorithmic Proof of Conjecture 3

In this section, we work toward establishing Conjecture 3. Unless otherwise specified, all Boolean functions in this section are assumed to be lex.

Define $f(x) = -x \log x$. Note that if $b$ is lex, then so is $\neg b$ (i.e., the negation of $b$). Therefore, to prove (6), it is sufficient to prove

$$\mathbb{E}_{Y^n} f\big(\Pr\{b(X^n) = 0 \mid Y^n\}\big) \ge f\big(\Pr\{b(X^n) = 0\}\big) H(\alpha). \tag{28}$$

To simplify notation, for a dyadic rational $p = k/2^n$, define

$$T(p) \triangleq \mathbb{E}_{Y^n} f\big(\Pr\{b(X^n) = 0 \mid Y^n\}\big), \tag{29}$$

where $b$ is the unique lex function on $n$ inputs with $\Pr\{b(X^n) = 0\} = p$. Note that if $k$ is even, $b$ does not depend on its input bit $x_n$. Therefore, $T(p)$ is well-defined for all dyadic rationals in $[0, 1]$. Thus, the validity of (28) for all lex $b$ (and all $n$) is equivalent to

$$T(p) \ge f(p) H(\alpha). \tag{30}$$

After noting some basic properties of $T(p)$, the inequality (30) can be proved by a simple, recursive computation. This is captured by the following theorem.




Fig. 1: A comparison of $T(p)$ and $f(p)H(\alpha)$ for $\alpha = 0.1$. The broken line shows the three chords Algorithm III.1 checks before terminating.



Theorem 3. Fix $\alpha \in (0, 1/2)$. If a call to Algorithm III.1 with arguments $(p_-, p_+) = (1/2, 1)$ eventually terminates, then Conjecture 3 is true for the chosen $\alpha$.



Algorithm III.1: TestInequality(p_-, p_+)

    main
        if CheckChord(p_-, p_+) < 0
            then  p <- (p_- + p_+)/2
                  TestInequality(p_-, p)
                  TestInequality(p, p_+)

    procedure CheckChord(a, b)
        comment: C(x) is the chord connecting points (a, T(a)) and (b, T(b)).
        v <- min_{x in [a, b]} C(x) - f(x) H(alpha)
        return (v)


Remark 1. In the subroutine CheckChord(a, b) of Algorithm III.1, the minimization has a closed form solution.

Using a Matlab implementation of Algorithm III.1, we have validated (30) for $\alpha$ ranging from 0 to 1/2 in increments of 0.001. Hence, we are confident that Conjecture 3 is true in general.

Unfortunately, the computations required by Algorithm III.1 can be tedious, and therefore computer assistance is generally needed. Figure 1 illustrates the chords that Algorithm III.1 tests to verify the validity of (30) for $\alpha = 0.1$.
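A compact Python sketch of Algorithm III.1 is given below. It is not the Matlab implementation mentioned above, and several choices are assumptions made here for illustration: $T(p)$ is computed by brute-force enumeration of $y^n$; logarithms are base 2; the minimizer in CheckChord is obtained by setting the derivative of $C(x) - f(x)H(\alpha)$ to zero and clipping to $[a, b]$; a small tolerance `tol` replaces the exact comparison with 0 to absorb floating-point rounding at points where $T(p) = f(p)H(\alpha)$ holds with equality (such as $p = 1/2$ and $p = 1$); and a recursion budget `depth` guards against non-termination. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import log, log2


def f(x: float) -> float:
    """f(x) = -x log2(x), with the convention f(0) = 0."""
    return 0.0 if x <= 0.0 else -x * log2(x)


def entropy(alpha: float) -> float:
    """Binary entropy H(alpha) in bits."""
    return f(alpha) + f(1.0 - alpha)


def posterior_zero(k: int, y: tuple, alpha: float) -> float:
    """P{b(X^n) = 0 | y^n} for the lex b whose zero set is the first k strings."""
    n = len(y)
    if k <= 0:
        return 0.0
    if k >= 2 ** n:
        return 1.0
    half = 2 ** (n - 1)
    p0 = 1.0 - alpha if y[0] == 0 else alpha  # P{X_1 = 0 | y_1}
    if k <= half:
        return p0 * posterior_zero(k, y[1:], alpha)
    return p0 + (1.0 - p0) * posterior_zero(k - half, y[1:], alpha)


def T(p: Fraction, alpha: float) -> float:
    """T(p) from (29) for a dyadic rational p = k / 2^n, by enumeration over y^n."""
    n = p.denominator.bit_length() - 1
    total = sum(f(posterior_zero(p.numerator, y, alpha))
                for y in product((0, 1), repeat=n))
    return total / 2 ** n


def check_chord(a: Fraction, b: Fraction, alpha: float) -> float:
    """min over x in [a, b] of C(x) - f(x) H(alpha), C the chord of T from a to b."""
    ta, tb = T(a, alpha), T(b, alpha)
    slope = (tb - ta) / float(b - a)
    h = entropy(alpha)
    # The objective is convex in x; its stationary point solves
    # slope + h * (log2(x) + 1/ln 2) = 0.
    x_star = 2.0 ** (-slope / h - 1.0 / log(2.0))
    x = min(max(x_star, float(a)), float(b))
    return ta + slope * (x - float(a)) - f(x) * h


def verify_inequality(lo, hi, alpha, chords, tol=1e-12, depth=12):
    """Recursive bisection of Algorithm III.1; accepted chords are appended to chords."""
    if check_chord(lo, hi, alpha) < -tol:
        if depth == 0:
            raise RuntimeError("recursion budget exhausted")
        mid = (lo + hi) / 2
        verify_inequality(lo, mid, alpha, chords, tol, depth - 1)
        verify_inequality(mid, hi, alpha, chords, tol, depth - 1)
    else:
        chords.append((lo, hi))
    return chords


if __name__ == "__main__":
    for lo, hi in verify_inequality(Fraction(1, 2), Fraction(1), 0.1, []):
        print(lo, hi)
```

For $\alpha = 0.1$ this sketch accepts three chords, with breakpoints at $p = 5/8$ and $p = 3/4$. Exact dyadic arithmetic via `Fraction` keeps the bisection points free of rounding, while the values of $T$ themselves are ordinary floating-point numbers.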

Despite the apparent gap between $T(p)$ and $f(p)H(\alpha)$ for $p \in (1/2, 1)$ (e.g., Fig. 1), the oscillatory behavior of $T(p)$ seems to render traditional analysis techniques ineffective in establishing (30). This was our motivation for pursuing an algorithmic proof. To get a sense for the strange behavior of $T(p)$, we point out that it is possible to show that $\lim_{\alpha \to 0} T(p)/H(\alpha)$ is equal to the Takagi function, a classical construction of an everywhere-continuous, nowhere-differentiable function which is closely related to Theorem 1 (cf. [9]). We omit the details.

1) Proof of Theorem 3: The proof of Theorem 3 requires the following lemmas.



Lemma 2. In order to prove (30), it is sufficient to consider $p \in [1/2, 1]$.

Proof: Suppose $b$ and $b'$ are both lex and satisfy$^1$

$$p \triangleq \Pr\{b'(X^n) = 0\} = \tfrac{1}{2} \Pr\{b(X^{n-1}) = 0\}. \tag{31}$$

Then we have $f(2p) = 2f(p) - 2p$ and

$$\Pr\{b'(X^n) = 0 \mid y^n\} = \Pr\{X_1 = 0 \mid y_1\} \Pr\{b(X_2^n) = 0 \mid y_2^n\},$$

which implies the functional relation:

$$2T(p) = T(2p) + 2pH(\alpha). \tag{32}$$

It follows easily that

$$T(2p) - f(2p)H(\alpha) = 2\big(T(p) - f(p)H(\alpha)\big). \tag{33}$$

Thus, the claim is proved. ■
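The functional relation (32) can be checked in a few lines. The derivation below is a sketch that assumes the logarithm in $f$ is taken base 2 and uses the shorthand $A$ and $B$, introduced here only for illustration, for the two posterior factors; $A$ and $B$ are independent because $Y_1$ is independent of $Y_2^n$.

```latex
\begin{aligned}
&A = \Pr\{X_1 = 0 \mid Y_1\}, \qquad B = \Pr\{b(X_2^n) = 0 \mid Y_2^n\},\\
&\mathbb{E}[A] = \tfrac{1}{2}, \qquad \mathbb{E}[B] = 2p, \qquad
 \mathbb{E}[f(A)] = \tfrac{1}{2}\big(f(\alpha) + f(1-\alpha)\big) = \tfrac{1}{2} H(\alpha),\\
&f(AB) = -AB \log A - AB \log B = B\, f(A) + A\, f(B),\\
&T(p) = \mathbb{E}[f(AB)] = \mathbb{E}[B]\,\mathbb{E}[f(A)] + \mathbb{E}[A]\,\mathbb{E}[f(B)]
      = p\,H(\alpha) + \tfrac{1}{2}\, T(2p),\\
&T(2p) - f(2p) H(\alpha) = 2T(p) - 2p H(\alpha) - \big(2 f(p) - 2p\big) H(\alpha)
      = 2\big(T(p) - f(p) H(\alpha)\big).
\end{aligned}
```

Since the two sides of (30) at $p$ and at $2p$ differ by the positive factor 2, the inequality holds at $p$ exactly when it holds at $2p$, so any $p < 1/2$ can be doubled until it lands in $[1/2, 1]$.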
Although T(p) is not concave, we are able to prove a 
pseudo-concavity characteristic of T(p). This is exploited in 
the following claim. 

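Lemma 3. For $k < 2^n$, consider the lex functions $b_-$ and $b_+$ which satisfy

$$p_- \triangleq \Pr\{b_-(X^n) = 0\} = \frac{k}{2^n}, \tag{34}$$
$$p_+ \triangleq \Pr\{b_+(X^n) = 0\} = \frac{k + 1}{2^n}. \tag{35}$$

If $\theta \in [0, 1]$ and $\theta p_- + (1 - \theta) p_+$ is a dyadic rational, then

$$T(\theta p_- + (1 - \theta) p_+) \ge \theta T(p_-) + (1 - \theta) T(p_+).$$

Proof: Let $b$ be the unique lex function on $n + 1$ inputs which satisfies

$$\Pr\{b(X^{n+1}) = 0\} = \frac{2k + 1}{2^{n+1}} = \frac{1}{2}(p_- + p_+). \tag{36}$$

By construction, we have

$$
\begin{aligned}
\Pr\{b(X^{n+1}) = 0 \mid Y^{n+1}\} ={}& \Pr\{X_1 = 0 \mid Y_1\} \Pr\{b_+(X_2^{n+1}) = 0 \mid Y_2^{n+1}\}\\
&+ \Pr\{X_1 = 1 \mid Y_1\} \Pr\{b_-(X_2^{n+1}) = 0 \mid Y_2^{n+1}\}.
\end{aligned}
\tag{37}
$$

Combining (36) and (37) with the fact that $f(x)$ is concave, we have

$$
\begin{aligned}
T\big(\tfrac{1}{2} p_- + \tfrac{1}{2} p_+\big)
\ge{}& \mathbb{E}_{Y^{n+1}} \Pr\{X_1 = 0 \mid Y_1\}\, f\big(\Pr\{b_+(X_2^{n+1}) = 0 \mid Y_2^{n+1}\}\big)\\
&+ \mathbb{E}_{Y^{n+1}} \Pr\{X_1 = 1 \mid Y_1\}\, f\big(\Pr\{b_-(X_2^{n+1}) = 0 \mid Y_2^{n+1}\}\big)\\
={}& \tfrac{1}{2} T(p_-) + \tfrac{1}{2} T(p_+).
\end{aligned}
\tag{38}
$$

A simple inductive argument completes the proof. ■

Footnote 1: Any lex function with $\Pr\{b(X^n) = 0\} = k/2^n$ can be reduced to a lex function on $n - 1$ inputs if $k$ is even.

We are now in a position to prove Theorem 3.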

Proof of Theorem 3: Lemmas 2 and 3 imply that, in order to prove (30), it is sufficient to construct a piecewise linear function $g : [1/2, 1] \to [0, \infty)$ satisfying the following properties:

1) Each segment of $g$ is a chord connecting the points $(p_-, T(p_-))$ and $(p_+, T(p_+))$, where $p_-$ and $p_+$ are of the form given in (34) and (35) for some $k$, $n$.

2) For $p \in [1/2, 1]$, $g(p) \ge f(p)H(\alpha)$.

By definition, Algorithm III.1 terminates only if it constructs such a function. ■

IV. Concluding Remarks 

Although Conjecture 1 remains open, we have provided substantial evidence in support of its validity. Indeed, our results suggest that Conjecture 2 is valid and we have an algorithmic proof establishing Conjecture 3 for any given value of $\alpha$. Any complete proof of Conjectures 1 or 2 would be of significant interest, since it would likely require new methods which may be applicable in information theory and elsewhere (e.g., in proving discrete isoperimetric inequalities).

We leave the reader with a weak form of Conjecture 1 which could provide insight. For Boolean functions $b$, $b'$, does it hold that $I(b(X^n); b'(Y^n)) \le 1 - H(\alpha)$? While this problem appears difficult in general, it is a simple exercise to show this is true when $b(X^n)$ and $b'(Y^n)$ are both Bernoulli(1/2). Intuitively, this should be the case for $b$, $b'$ which maximize $I(b(X^n); b'(Y^n))$.